Bayesian optimistic Kullback–Leibler exploration



Related articles

Bayesian Multi-Scale Optimistic Optimization

where $\sigma_T^2(x) = \kappa(x,x) - k_{1:T}(x)^\top K^{-1} k_{1:T}(x)$ and this bound is tight. Moreover, $\sigma_T^2(x)$ is the posterior predictive variance of a Gaussian process with the same kernel. Lemma 3 (Adapted from Proposition 1 of de Freitas et al. (2012)). Let $\kappa : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ be a kernel that is twice differentiable along the diagonal $\{(x,x) \mid x \in \mathbb{R}^D\}$, with $L$ defined as in Lemma 1.1, and $f$ be an element of the RKHS wit...
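As a rough illustration of the variance expression in this excerpt, the following Python sketch evaluates κ(x,x) − k_{1:T}(x)ᵀ K⁻¹ k_{1:T}(x) for a noise-free Gaussian process. The squared-exponential kernel, the lengthscale, the jitter term, and the function names `rbf_kernel` and `posterior_variance` are assumptions made for this example; none of them comes from the excerpt.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of a and b (assumed kernel choice)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def posterior_variance(x_query, x_train, kernel=rbf_kernel, jitter=1e-10):
    """Return kappa(x, x) - k(x)^T K^{-1} k(x) for every row x of x_query."""
    # Gram matrix of the T observed points; jitter is a small numerical stabiliser (assumption).
    K = kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k = kernel(x_train, x_query)               # shape (T, n_query)
    prior = np.diag(kernel(x_query, x_query))  # kappa(x, x) for each query point
    solved = np.linalg.solve(K, k)             # K^{-1} k without forming the inverse
    return prior - np.sum(k * solved, axis=0)

# Hypothetical one-dimensional data: three observed points, two query points.
x_train = np.array([[0.0], [1.0], [2.0]])
x_query = np.array([[1.0], [5.0]])
var = posterior_variance(x_query, x_train)
print(var)
```

In this setup the variance is close to zero at the query point that coincides with an observed point and close to the prior value κ(x,x) = 1 at the query point far from the data.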


Optimistic Simulated Exploration as an Incentive for Real Exploration

Many reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with an unlimited number of states. I propose to use simulated exploration with an optimistic model to discover promising paths for real exploration. This reduces the need for real exploration.


Optimistic Bayesian Sampling in Contextual-Bandit Problems

In sequential decision problems in an unknown environment, the decision maker often faces a dilemma over whether to explore to discover more about the environment, or to exploit current knowledge. We address the exploration-exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. The contextual bandit problem has recently resurfaced in attempts to...


RNA-Seq Bayesian Network Exploration of Immune System in Bovine

Background: Stress is one of the main factors affecting production systems. Several factors (both genetic and environmental) regulate the immune response to stress. Objectives: To determine the major immune system regulatory genes underlying stress responses, a Bayesian network learning approach for those regulatory genes was applied to RNA-...


Model based Bayesian Exploration

Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an a...



Journal

Journal title: Machine Learning

Year: 2018

ISSN: 0885-6125,1573-0565

DOI: 10.1007/s10994-018-5767-4